SemanticScuttle - klotz.me » Tags: machine learning+llm+inference

Tags: machine learning* + llm* + inference*

OpenAI releases gpt-oss-120b and gpt-oss-20b, two state-of-the-art open-weight language models that deliver strong real-world performance at low cost. They outperform similarly sized open models on reasoning tasks and are optimized for efficient deployment.

2025-08-06 Tags: gpt-oss, open-weight models, llm, reasoning, openai, o3, o4-mini, machine learning, inference by klotz

El Reg's essential guide to deploying LLMs in production

Running GenAI models is easy. Scaling them to thousands of users, not so much. This guide details avenues for scaling AI workloads from proofs of concept to production-ready deployments, covering API integration, on-prem deployment considerations, hardware requirements, and tools like vLLM and Nvidia NIMs.

2025-04-28 Tags: llm, ai, production engineering, inference engineering, deployment, vllm, nvidia, kubernetes, inference, api, scaling, gpu, machine learning by klotz

Primer LLM Embedding

This Space demonstrates a simple method for embedding text using an LLM (Large Language Model) via the Hugging Face Inference API. It shows how to convert text into numerical vector representations, which are useful for semantic search and similarity comparisons.

2025-03-28 Tags: llm, embedding, hugging face, inference, api, semantic search, vector representation, text embedding by klotz

NVIDIA DGX Spark

NVIDIA DGX Spark is a desktop-friendly AI supercomputer powered by the NVIDIA GB10 Grace Blackwell Superchip, delivering 1,000 AI TOPS of performance with 128 GB of memory. It is designed for prototyping, fine-tuning, and inference of large AI models.

2025-03-24 Tags: machine learning, nvidia, dgx spark, llm, grace blackwell, ai development, inference, data science, gpu, cpu by klotz
